29 research outputs found
Distributed Machine Learning Framework: New Algorithms and Theoretical Foundation
Machine learning is gaining fresh momentum and has helped to enhance not only many industrial and professional processes but also everyday life. The recent success of machine learning relies heavily on the surge of big data, big models, and big computing. However, inefficient algorithms restrict the application of machine learning to big data mining tasks. In terms of big data, serious concerns such as communication overhead and data privacy must be rigorously addressed when we train models on large amounts of data located on multiple devices. In terms of big models, training a model that is too big to fit on a single device remains an underexplored research area. To address these challenging problems, this thesis focuses on designing new large-scale machine learning models and efficient optimization and training methods for big data mining, and presents new discoveries in both theory and applications.
For the challenges raised by big data, we propose several new asynchronous distributed stochastic gradient descent and coordinate descent methods for efficiently solving convex and non-convex problems. We also design new large-batch training methods for deep learning models that reduce computation time significantly while achieving better generalization performance. For the challenges raised by big models, we scale up deep learning models by parallelizing the layer-wise computations with a theoretical guarantee; this is the first algorithm to break the locking constraint of backpropagation, so that training of large models can be dramatically accelerated.
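As a rough illustration of the asynchronous paradigm mentioned above (a generic Hogwild!-style sketch, not the thesis's specific algorithms), several workers can share one parameter vector and apply lock-free stochastic gradient updates; the toy least-squares problem and all names here are assumptions for illustration only:

```python
import threading
import numpy as np

# Toy least-squares problem f(w) = mean_i (x_i^T w - y_i)^2,
# minimized by workers that read and write a shared parameter
# vector without any locks (asynchronous, Hogwild!-style).
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 8))
w_true = rng.normal(size=8)
y = X @ w_true

w = np.zeros(8)  # shared parameters, read and written without locks
LR = 0.02

def worker(seed, steps=4000):
    local = np.random.default_rng(seed)  # per-worker sampler
    for _ in range(steps):
        i = local.integers(len(X))
        # Stochastic gradient computed from a possibly stale read of w.
        grad = 2.0 * (X[i] @ w - y[i]) * X[i]
        w[:] -= LR * grad  # in-place, lock-free update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

loss = float(np.mean((X @ w - y) ** 2))
print(f"final loss: {loss:.4f}")
```

Despite races between workers, the updates are sparse relative to the parameter dimension here, so the shared iterate still converges toward the minimizer, which is the intuition behind lock-free asynchronous SGD.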
Ego-Downward and Ambient Video based Person Location Association
Using an ego-centric camera for localization and tracking is highly desirable
in urban navigation and indoor assistive systems when GPS is unavailable or
insufficiently accurate. Traditional hand-designed feature tracking and
estimation approaches fail without visible features. Recently, several works
have explored using context features for localization; however, all of them
suffer severe accuracy loss when no visual context information is available.
To address this problem, this paper proposes a camera system combining an
ego-downward view and a third-person static view to perform localization and
tracking with a learning-based approach. We also propose a novel action and
motion verification model for cross-view verification and localization. We
performed comparative experiments on a dataset we collected that accounts for
diversity in clothing, gender, and background. Results indicate that the
proposed model achieves improved accuracy. Finally, we tested the model in
multi-person scenarios and obtained an average accuracy
Exploit Where Optimizer Explores via Residuals
In order to train neural networks faster, many efforts have been devoted to
exploring a better solution trajectory, but few to exploiting the existing
solution trajectory. To exploit the trajectory of the (momentum) stochastic
gradient descent method (SGD(m)), we propose a novel method named SGD(m) with
residuals (RSGD(m)), which boosts both convergence and generalization. Our
new method can also be applied to other optimizers such as ASGD and Adam. We
provide theoretical analysis showing that RSGD achieves a smaller growth rate
of the generalization error and the same (but empirically better) convergence
rate compared with SGD. Extensive deep learning experiments on image
classification, language modeling, and graph convolutional neural networks
show that the proposed algorithm is faster than SGD(m)/Adam at the initial
training stage, and similar to or better than SGD(m) at the end of training,
with better generalization error.
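The abstract does not give the RSGD(m) update rule, so the sketch below is purely a hypothetical reading of "exploiting the existing solution trajectory": it runs standard SGD with momentum and re-injects a fraction of the previous parameter displacement as a "residual" term. The residual coefficient and all other names are assumptions, not the paper's definition:

```python
import numpy as np

# Toy least-squares problem for demonstrating the (hypothetical) update.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 4))
w_true = rng.normal(size=4)
y = X @ w_true

def rsgd_m_sketch(lr=0.005, momentum=0.9, res_coef=0.1, steps=3000):
    w = np.zeros(4)
    v = np.zeros(4)      # momentum buffer, as in standard SGD(m)
    w_prev = w.copy()    # previous iterate, used to form the residual
    for _ in range(steps):
        i = rng.integers(len(X))
        grad = 2.0 * (X[i] @ w - y[i]) * X[i]
        v = momentum * v + grad
        residual = w - w_prev          # displacement along the trajectory
        w_prev = w.copy()
        # Hypothetical residual term re-uses past trajectory information.
        w = w - lr * v + res_coef * residual
    return w

w = rsgd_m_sketch()
loss = float(np.mean((X @ w - y) ** 2))
print(f"loss after RSGD(m)-style sketch: {loss:.4f}")
```

Re-adding a fraction of the last displacement acts as a light extrapolation along the existing trajectory, which is one plausible way an optimizer could "exploit" rather than only "explore" it.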